Parkinson’s disease-associated shifts between DNA methylation and DNA hydroxymethylation in human brain in PD-related genes, including PARK19 (DNAJC6) and PTPRN2 (IA-2β)

Background The majority of Parkinson’s disease (PD) cases are due to a complex interaction between aging, genetics, and environmental factors; epigenetic mechanisms are thought to act as important mediators of these risk factors. While multiple studies to date have explored the role of DNA modifications in PD, few focus on 5-hydroxymethylcytosine (5hmC). Because 5hmC occurs at its highest levels in the brain and is thought to be particularly important in the central nervous system, particularly in the response to neurotoxicants, it is important to explore the potential role of 5hmC in PD. This study expands on our previously published epigenome-wide association study (EWAS) performed on DNA isolated from neuron-enriched nuclei from human postmortem parietal cortex from the Banner Sun Health Research Institute Brain Bank. The study aimed to identify paired changes in 5hmC and 5mC in PD in enriched neuronal nuclei isolated from PD post-mortem parietal cortex and age- and sex-matched controls. We performed oxidative bisulfite (oxBS) conversion and paired it with our previously published bisulfite (BS)-based EWAS on the same samples to identify cytosines with significant shifts between these two related epigenetic marks. Interaction differentially modified cytosines (iDMCs) were identified using our recently published mixed-effects model for co-analyzing βmC and βhmC data. Results We identified 1,030 iDMCs with paired changes in 5mC and 5hmC (FDR < 0.05) that map to 695 genes, including PARK19 (DNAJC6), a familial PD gene, and PTPRN2 (IA-2), which has been previously implicated in PD in both epigenetic and mechanistic studies. The majority of iDMC-containing genes have not previously been implicated in PD and were not identified in our previous BS-based EWAS. Conclusions These data potentially link epigenetic regulation of the PARK19 and PTPRN2 loci in the pathogenesis of idiopathic PD. In addition, iDMC-containing genes have known functions in synaptic formation and function, cell cycle and senescence, neuroinflammation, and epigenetic regulation. These data suggest that there are significant shifts between 5mC and 5hmC associated with PD in genes relevant to PD pathogenesis that are not captured by analyzing BS-based data alone or by analyzing each mark as a distinct dataset.

Recently, studies have begun to explore links between 5hmC and PD (44)(45)(46).First, rare variants in TET1 were associated with increased risk of PD in a Chinese PD cohort (44).Second, a targeted analysis of DNA modi cations within known enhancers in human postmortem prefrontal cortex identi ed epigenetic disruption of an enhancer targeting the TET2 gene in PD patients (46).This study also performed hydroxymethylated DNA immunoprecipitation-sequencing (hMeDIP-Seq) and found that PD-associatedhydroxymethylated peaks were enriched in gene bodies, promoters, and enhancers.Third, a small study in human postmortem substantia nigra (SN) used hMe-Seal, a selective chemical labeling method, and identi ed thousands of differentially hydroxymethylated regions in genes related to CNS and neuronal differentiation, neurogenesis, and development and maintenance of neurites and axons, although the widespread neurodegeneration in the SN by the time of PD diagnosis complicates interpretation of these results (45).Regardless, taken together, these initial studies support a role for 5hmC in regulation of expression of genes important for PD pathogenesis and indicate that additional research is warranted.
In our previous study, we performed a neuron-speci c epigenome-wide association study (EWAS) with the Illumina EPIC BeadChip array paired with BS conversion using enriched neuronal nuclei from human postmortem parietal cortex obtained from the Banner Sun Health Research Institute Brain Bank (47).We identi ed largely sex-speci c PD-associated changes in DNA modi cation in 434 unique genes, including genes previously implicated in PD, including PARK7 (DJ-1), SLC17A6 (VGLUT2), PTPRN2 (IA-2β), and NR4A2 (NURR-1), as well as genes involved in developmental pathways, neurotransmitter packaging and release, and axon/neuron projection guidance.However, we could not differentiate between 5mC and 5hmC because we used BS.
Here, we report the results of an epigenome-wide analysis of 5hmC and 5mC in enriched neurons from PD brain using our recently proposed method for reconciling base-pair resolution 5mC and 5hmC data (48)(49)(50).This method considers 5mC and 5hmC as paired data since these marks are biologically and statistically dependent on each other.We utilized additional DNA isolated from the same neuronenriched samples and performed oxidative BS (oxBS) conversion paired with the Illumina EPIC BeadChip array to speci cally measure 5mC (51,52).oxBS adds an oxidation step with potassium perruthenate (KRuO 4 ) that speci cally oxidizes 5hmC, forming 5-formylcytosine, prior to BS conversion.BS then deaminates 5mC only, but not 5-formylcytosine, such that C and 5hmC are read as thymine, while only the 5mC is read as cytosine, providing a readout of "true" methylation.Comparison of BS and oxBS results allows estimation of 5hmC.To our knowledge, this is the largest epigenome-wide analysis of 5hmC to date in neurons enriched from PD post-mortem brain at single base pair resolution.

Human brain tissue
De-identi ed tissue samples from control (n = 50) and sPD (n = 50) human brain samples were obtained from archival human autopsy specimens provided by the Banner Sun Health Research Institute (BSHRI), using BSHRI's approved institutional review board (IRB) protocols.Further details about the BSHRI's brain samples and sample selection are available in a previous publication (53).We selected PD patients with mid-stage disease (Braak stage = II-III), as de ned by Lewy pathology (54).Postmortem parietal cortex was selected because this region develops pathology in the late stages of PD and is expected to still have robust populations of neurons from which disease-associated, pre-degenerative gene regulatory marks can be measured.

Magnetic-activated cell sorting
DNA samples previously isolated from an enriched neuronal population from de-identi ed parietal cortex samples from the BSHRI Brain Bank were used for this study.NeuN-positive (NeuN + ) nuclei were enriched from 100 mg of ash-frozen parietal cortex tissue using a two-stage magnetic-assisted cell sorting (MACS) method as previously described (47).

DNA extraction
DNA was isolated from enriched NeuN + nuclei using the Qiagen QIAamp DNA Micro Kit (Cat.# 56304) as previously described (47).

Oxidative bisul te treatment and EPIC arrays
Intact genomic DNA yield was quanti ed by Qubit uorometry (Life Technologies).oxBS conversion was performed on 1 µg genomic DNA using the TrueMethyl Array kit (Cambridge Epigenetix).While the recommended amount of 500 ng DNA was su cient for BS conversions, this was insu cient for oxBS, likely due to the additional harsh oxidation step.Cleanup and preparation of DNA, and all steps for the EPIC bead chip protocol were performed as previously described per the manufacturer's protocol (47).
We continued with male samples only due to small sample size of the remaining female samples.Samples with estimated glial cell proportion > 0.50 were also removed from analysis.This removed one sample from the male data set, leaving 56 males (27 PD,29 control).Data for the included samples are summarized in Table 1 and full metadata is in Supplementary File 3.

Differential testing for individual CpG sites
The gamlss R package (version 5.4-22) was used to test for interaction differentially methylated cytosines (iDMCs) as previously described (Supplementary File 2) (47,48,64,65).All models included glial cell proportion estimated from BS data and PMI as covariates ( 47).An FDR < 0.05 was used as the cutoff for signi cance and annotation of signi cant differential probes was performed using the Illumina EPIC array manifests.P-value histograms generated to con rm accuracy of modeling show that the interaction model provided more appropriate histograms (uniform with an overabundance of low pvalues), whereas the separate models produced histograms with an overabundance of high p-values (Supplementary Fig. 1) (66).For separate analyses of β mC and β hmC , neither normal nor beta regression modeling accurately modeled the data.All code for data visualizations is provided in Supplementary Files 6,7.

Annotation of interaction DMCs
Gene IDs corresponding to each iDMC were extracted from the EPIC array manifest provided by Illumina (v1.0 B5).Annotation of universal chromatin states was also performed using the annotatr R package (version 1.28.0) and adding custom full stack ChromHMM chromatin states for hg38 to the annotation cache (annotatr citation) (67, 68).Since this study utilized neuronally enriched nuclei, neuronal expression of iDMC-containing genes was veri ed using the Allen Brain Cell Atlas (69).The Harmonize and GeneCards databases were used to explore gene function (70,71) Gene ontology pathway enrichment and protein-protein interaction networks Gene ontology (GO) term enrichment testing and pathway analysis was performed on genes annotated to iDMCs identi ed in the interaction model using the ClueGO application in Cytoscape (version 3.10.1)(72,73)."Groups" was selected as the visual style, and the GO biological process (GOBP) term was selected.Network speci city was set to "Medium", with the GO Tree Interval minimum set at 3 and maximum at 8.Only terms with at least 3 genes and a Bonferroni-corrected p-value < 0.05 were included in pathway visualizations.The connectivity score (Kappa) was set at 0.4, and default GO Term Grouping settings were used in all analyses.The genes found in enriched GOBP terms by ClueGO were used for protein-protein interaction network analysis using STRING (version 12.0) (74).STRING network analysis was performed using default parameters, including a minimum required interaction score = 0.9 (Fig. 3) or 0.7 (Supplementary Fig. 2).

Reanalysis of BS data
Data from our previous publication were reanalyzed to include intergenic loci to utilize updates to the EPIC manifest and include chromatin state annotation as performed here (47).This was carried out as previously described with the following modi cations.The nal ltering step to remove probes without genic annotations based on the Illumina EPIC array manifest was removed so that all probes that passed QC and ltering steps were included in the differential methylation analysis.Differentially methylated probes and differentially methylated regions were annotated to chromatin states by using the annotatr R package (version 1.28.0) and adding custom full stack ChromHMM chromatin states for hg38 to the annotation cache (67, 68).Most of the DMCs, DMRs, and genes identi ed overlap with the results from our previous publication with some additions (Table 2).Annotated DMCs and DMRs identi ed here are included in Supplementary Files 9 and 10.Code for QC was run as previously written and modi ed code for the differential methylation analysis is provided in Supplementary File 8.

Results
We identi ed 1,030 iDMCs with signi cant shifts in the proportions of 5mC and 5hmC associated with PD in DNA isolated from an enriched neuronal population derived from parietal cortex (FDR < 0.05) (Fig. 1C; Supplementary File 4).More iDMCs (652) have a negative than a positive beta coe cient (378).
Consistent with known distribution of 5hmC, the majority of iDMCs were found in gene bodies (Fig. 1D).
Chromatin state annotation shows that iDMCs are located mainly in transcriptionally active chromatin, consistent with their location within gene bodies, as well as quiescent chromatin, polycomb repressed and open chromatin, active enhances (Fig. 1E).
In this output, a positive beta coe cient (interaction term) indicates a relative decrease in 5mC and an increase in 5hmC in PD brains, as illustrated by the most signi cant iDMC with a positive beta coe cient (Fig. 2A) and the DMC with the most positive beta coe cient (Fig. 2C).In contrast, a negative interaction term indicates a relative increase in 5mC and a decrease in 5hmC in PD brains, as illustrated by the most signi cant DMC with a negative beta (Fig. 2B) coe cient and the most negative beta coe cient (Fig. 2D).Corresponding raw BS β-values for each iDMCs demonstrate that these probes were not identi ed by BS data alone (Fig. 1F-J).Of these 1,030 iDMCs, 681 are found within genes and annotate to 695 genes, including the familial PD gene PARK19 (DNAJC6) and PTPRN2, a gene previously implicated in PD and identi ed in multiple EWAS studies (Fig. 2E,F; Supplementary Files 5,12).Only 49 of these genes were also identi ed in our BS-only EWAS, but at different cytosines (Supplementary File 11) (47).
Of the 695 genes, 375 annotated to 16 enriched GO terms in 7 GO term groups, which are enriched for genes involved in nervous system development, cell-cell adhesion, regulation of signal transduction and cell communication, morphogenesis, and cell differentiation (Fig. 3A; Table 3).Genes within the enriched GO terms were used for STRING network analysis to identify potential functional interactions between proteins encoded by differentially modi ed genes.All but 1 gene mapped to the STRING database (RTEL1-TNFRSF6B).For clarity, networks of proteins within enriched GO terms with the highest stringency settings are shown in Fig. 3B.Less stringent networks for each individual GO term group are shown in Supplementary Fig. 2 to highlight the high degree of potential connectivity within each group.
While there are major differences in existing PD EWAS related to sample size, sample selection, brain region assessed, methods used to measure DNA modi cations, and statistical modeling, as well as inconsistent reporting between studies, we compared the list of genes in this study to recent brainspeci c EWAS studies for PD, including ours, for which data was provided (36, 37,40,46).One study performed hMe-DIP in prefrontal cortex to assess genome-wide 5hmC (46).In addition, three BS-based studies used either the EPIC array or the previous 450K array in SN, frontal cortex, the dorsal motor nucleus of the vagus (DMV), and cingulate gyrus (CG) (36, 37, 40).Not surprisingly, given the differences between these studies, overlap between our data and these studies was minimal (Table 4, Supplementary File 12).The gene most consistently identi ed in these studies is PTPRN2, which we previously reported as showing sexually dimorphic alterations in DNA modi cations and has previously been implicated in PD (47,(75)(76)(77).The methodological differences between these studies complicate the comparison of these studies and underscore the need for rigorous and reproducible methodology and analysis, as discussed in our recent chapter on rigor and reproducibility in EWAS (78).
Table 3 Enriched GO Terms and GO Term Groups of interaction DMC-containing genes.Results of ClueGO gene ontology enrichment analysis and signi cant GO terms are shown (p < 0.05).GO terms are grouped when they share > 50% of their genes."Associated Genes" shows all interaction DMC containing genes that map to each GO Term Group.

Discussion
Signi cant shifts between 5mC and 5hmC associated with PD Here, we performed an integrated genome-wide analysis of 5mC and 5hmC using our novel application of mixed effects modeling in enriched neuronal nuclei from PD post-mortem parietal cortex samples (48-50) These PD-associated iDMCs were largely unique from DMCs identi ed in our previous BS-based EWAS: 49 genes were identi ed in both studies, 646 only in the paired analysis, and 498 genes identi ed only in the BS-only analysis (Supplementary Files 5,9-11) (47).Collectively, these data suggest that there are signi cant PD-associated shifts between 5mC and 5hmC at iDMCs that are not captured by analyzing BS-based data alone or by analyzing each mark as distinct datasets (Fig. 1C) (47,48).Because many of these iDMCs are located within genes previously implicated in PD and in genes within pathways thought to play a role in PD, these data suggest that shifts in the balance between DNA modi cations may play an important but unrecognized role in PD etiology in both known and novel PD-related genes.
An important caveat of these ndings is that this study does not address the biological signi cance of these epigenetic shifts.While shifts between 5mC and 5hmC may potentially impact the binding of proteins that regulate gene expression and/or other epigenetic marks, this study does not examine these functional impacts; however, it does provide multiple avenues for further study of the impact of these changes on gene expression, alternate promoter usage, differential isoform expression, and neuronal function and susceptibility.

Epigenetic regulation of the familial PARK19 (DNAJC6) PD gene
As with our previous study, we identi ed epigenetic changes in known PD genes.Here, we identi ed an iDMC within the 3′ UTR of PARK19, which encodes DNAJC6.This iDMC shows increased 5mC and decreased 5hmC in PD and is located in a region annotated by universal chromatin state annotation as a transcribed enhancer (Fig. 2E; Supplementary File 5) (67).Autosomal recessive mutations in PARK19 predominantly cause juvenile-or early-onset PD, with atypical signs and symptoms characterized by poor L-DOPA responsiveness, pyramidal signs, dystonia, rapid disease course, mental retardation, and seizures and is one of multiple DNAJC genes associated with familial forms of PD (2, 5, 79-83).DNAJC6 works with multiple other genes linked to familial and sporadic PD, including PARK20 (SYNJ1), PARK8 (LRRK2), PARK2 (PRKN), SNCA (α-synuclein), and other DNAJC proteins (DNAJC26, RAK; DNAJC13, REM8), to regulate key steps in synaptic vesicle tra cking, synaptic vesicle endocytosis, and clathrindynamics in endocytosis (83-87).While mutations in this gene have only been associated with rare familial forms of PD, these data potentially link epigenetic regulation of this locus with the pathogenesis of sPD and are consistent with previous ndings from our lab and others demonstrating epigenetic changes in familial and GWAS PD genes in brain tissue from sPD cases (12,(38)(39)(40)(41)47).

Epigenetic regulation of PTPRN2
The gene most consistently identi ed in this and other EWAS studies is PTPRN2 (Fig. 2F; Supplementary File 12).We previously reported sexually dimorphic alterations in DNA modi cations within this gene, and all but one of the previous EWAS in our comparison also identi ed altered DNA modi cations within this gene (36,37,40,47).In our previous study, we found multiple DMCs and one DMR annotated to the PTPRN2 gene that showed exact, complete overlap in both male and female samples and was hypermethylated in male samples but hypomethylated in female samples (47).PTPRN2 encodes the protein tyrosine phosphatase receptor type N2 (IA-2β), which is a major autoantigen in Type 1 diabetes and is expressed on dense core and synaptic vesicles.DNA modi cations within this locus have also been associated with Type 2 diabetes and pesticide exposures (88, 89).In addition to its well-established role in insulin release and diabetes, multiple studies demonstrate a role for this gene in the release of monoamines (dopamine, norepinephrine, and serotonin) in the brain that lead to de cits in learning and memory, and motor behavior (75,(90)(91)(92).Speci cally relevant to PD, a hypomethylated DMC within PTPRN2 in whole blood was associated with faster motor progression (77).In addition, decreased expression of PTPRN2 has also been observed in the SN of PD patients, and increased expression has been seen in DA neurons derived from PD patients with LRRK2 G2019S mutations (75,76).Mechanistically, there are multiple non-coding transcripts, alternate protein-coding transcriptions, and splice variants encoded by the PTPRN2 locus, and expression of PTRPRN2 is regulated by multiple miRNAs (93,94).Thus, epigenetic regulation of this locus could potentially lead to altered transcript usage or differential regulation by miRNAs that affect monoaminergic signaling.
PD-associated interaction DMCs are enriched in pathways related to synaptic formation and function, senescence, neuroin ammation, and epigenetic regulation Within the 695 iDMC-containing genes, our functional annotation found that these gene products function within speci c pathways related to neuronal dysfunction in PD (Fig. 2, Table 2).These pathways are described here with STRING subnetworks in Fig. 2B indicated by their numbers; expanded connections with additional iDMC-containing genes are shown in Supplementary Fig. 2.
Multiple iDMC-containing genes encode proteins that converge on functions related to proper development and function of synapses, including synaptogenesis, synaptic plasticity, neurotransmitter release, and clathrin-mediated endocytosis (networks 2,4,7,8,9,10,13,14) (Fig. 2B).In addition, a set of genes play roles in the myelination and maintenance of myelin (networks 3,5), as well as in the synthesis of dopamine, serotonin, and norepinephrine (network 4) (Fig. 2B).Together, this suggests the regulation of genes related to the formation of synapses and the maintenance of synaptic and vesicular integrity are disrupted prior to the onset of pathology within the parietal cortex.This is also consistent with ndings in our mouse model of increased PD susceptibility, showing that exposure to the organochlorine pesticide dieldrin is associated with epigenetic changes in similar key neurodevelopmental pathways across the lifespan (95).Together, these pathways coordinate the formation, structure, function, and plasticity of synapses, consistent with the idea that the integrity of synapses is critical for proper neurotransmission and that multiple PD-related mechanisms converge on the disruption of synaptic function (96-99).
A large body of evidence now supports a role of neuroin ammatory pathways in PD (100-102).Here, we identi ed multiple iDMC-containing genes important in the neuronal response to environmental stressors and neuroin ammatory signals, including multiple genes within the NFκB signaling pathway (networks 3,11) (Fig. 2B).Altered epigenetic regulation of these genes could lead to impropriate activation of these pathways in response to cellular and environmental stressors.We also identi ed genes involved in cell cycle and senescence pathways (1,12), consistent with evidence that neuronal cell cycle reentry events and cellular senescence in the aging brain can lead to degeneration (103-105).Finally, we observed epigenetic regulation of multiple epigenetic regulators themselves (network 6) (Fig. 2B).
While these changes were not assessed in the SN, the region most commonly studied in the context of PD, by using the parietal cortex, which does not have widespread degeneration, we were able to assess neuron-speci c DNA modi cations associated with PD in a region without widespread neuron loss prior to the onset of pathology within the parietal cortex.Despite this, overall, these data suggest that PDassociated alterations in the epigenetic regulation of these genes may alter gene expression, promoter usage, or isoform expression of these genes and may represent early events that precede the onset of degeneration.

Technical limitations of the oxBS method
In addition, it is also important to note that this included a high level of failed probes and samples that was unique to the oxBS reactions.In contrast, no samples were excluded due to high levels of failed probes, and far fewer probes failed from the BS data.As noted in the methods, we started with 1 µg of input DNA for oxBS because when we used the recommended starting amount of 500 ng, all oxBS probes failed.Together, this suggests that oxBS is much harsher on the DNA, requiring high input amounts that may limit the utility of this method as the eld moves towards cell type-speci c methods and lower input amounts.

Conclusions
These results draw a possible link between epigenetic regulation of the PARK19 and PTPRN2 loci and the pathogenesis of sPD.Additional iDMC-containing genes play critical roles in synaptic formation and function, cell cycle and senescence, neuroin ammation, and epigenetic regulation.This is also consistent with ndings in our mouse model of increased PD susceptibility, showing that exposure to the organochlorine pesticide dieldrin is associated with epigenetic changes in similar pathways across the lifespan (95).Together, these pathways coordinate the formation, structure, function, and plasticity of synapses, consistent with the idea that the integrity of synapses is critical for proper neurotransmission and that multiple PD-related mechanisms converge on the disruption of synaptic function (96-99).In addition, most of these genes were not identi ed by a BS-only analysis on the same samples, suggesting that there are signi cant shifts between 5mC and 5hmC associated with PD in genes relevant to PD pathogenesis that are not captured by analyzing BS-based data alone or by analyzing each mark as a distinct dataset.for the most signi cant iDMC with a positive interaction term (A), the most signi cant iDMC with a negative interaction term (B), the iDMC with the most positive interaction term (C), the iDMC with the most negative interaction term (D) and the iDMC annotated to the PD gene PARK19/DNAJC6 (E) and to PTRPN2 (F).The corresponding beta coe cient/interaction term and FDR value are indicated for each iDMC.for the most signi cant iDMC with a positive interaction term (A), the most signi cant iDMC with a negative interaction term (B), the iDMC with the most positive interaction term (C), the iDMC with the most negative interaction term (D) and the iDMC annotated to the PD gene PARK19/DNAJC6 (E) and to

Figures Figure 1
Figures

Table 1
values (β hmC ) were estimated by pairing oxBS β values with our previously published BS data using the maximum likelihood estimate function (oxBS.MLE) from the ENmix package, which returns true methylation β values (β mC ) and estimates β hmC .Raw BS and oxBS values, as well as MLE-corrected β mC and β hmC values, are shown in Fig.1A,B.After MLE, probes with mean β mC or β hmC < 0.01 across all samples were removed due to increased variability and decreased interpretability of β values at such low levels, as well as to remove the issue of zero in ation for 5hmC β values.We then produced three data sets: one with all probes where β mC > 0.01 (714,199 probes), one with all probes where β hmC > 0.01 (588,147), and one with only probes common to the two sets, where β mC > 0.01 and β hmC > 0.01 (588,123).Of these, only 12 probes had β hmC data only, and 126,076 had β mC data only.

Table 2
Comparison of published and reanalyzed results of BS-only data.

Table 4
Summary of comparisons between the current study and previous PD EWAS in brain tissue.*Sex not speci ed, Abbreviations: SN, substantia nigra; PFC, prefrontal cortex; CTX, cortex; FC, frontal cortex; DMV, dorsal motor nucleus of the vagus; CG, cingulate gyrus; PC, parietal cortex